Historical Change in the Conceptualization of Gender in English Human-Generated Texts

Nela Riddle¹, Alex Tran², Amanda Diekman²

¹ Department of Computer Science, Math
² Department of Psychological and Brain Sciences

Introduction

Stereotypes about men and women have evolved over time, reflecting both real and perceived changes in gender roles. Prior research has shown that traits historically associated with men and women have shifted due to societal transformations (Bosak et al., 2017; Diekman & Eagly, 2000; Diekman et al., 2005; Lopez-Zafra & Garcia-Retamero, 2011; Gustafsson Sendén et al., 2019; Eagly et al., 2020). While men and women are perceived to be becoming more similar in their traits, persistent gender gaps in agency and communion remain (Eagly et al., 2020). However, these findings often rely on self-report data, which can introduce observer effects and inflate gender differences or similarities.

To overcome these limitations, we leverage large-scale, naturalistic human-generated text data as a measure of gender conceptualizations over time. Written language encodes societal beliefs, attitudes, and stereotypes about men and women, providing a rich source of insight into collective social representations (Bailey et al., 2022). By analyzing word embeddings—vector representations of word meaning derived from billions of word co-occurrences—we can track the evolution of gender-trait associations from the 19th century to the present.
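To make the idea of co-occurrence-based word vectors concrete, the following is a minimal sketch and not the embedding model used in this study. It assumes raw co-occurrence counts within a two-word window as the vector representation (trained embeddings are dense and learned from far more data), and the three sentences, the function names `cooccurrence_vectors` and `cosine`, and the window size are all invented for illustration.

```python
import math
from collections import Counter

# Toy sentences invented for illustration; they come from neither corpus.
sentences = [
    "the nurse was caring and kind",
    "the doctor was caring and kind",
    "the engine was loud and heavy",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context[tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

vectors = cooccurrence_vectors(sentences)
sim_nurse_doctor = cosine(vectors["nurse"], vectors["doctor"])
sim_nurse_engine = cosine(vectors["nurse"], vectors["engine"])
```

In this toy setting, "nurse" and "doctor" share all of their context words while "nurse" and "engine" share only some, so the first pair receives the higher similarity. The same logic, applied to far larger text collections, is what lets embeddings capture associations between group terms and trait words.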

Guided by Social Role Theory (Eagly, 1987; Eagly & Wood, 2012), we investigate how changes in gendered labor distributions correspond to shifts in trait perceptions. As gender segregation in social roles diminishes, we expect trait inferences about men and women to converge. To test this prediction, we incorporate national indicators of gender equality, including workforce participation (U.S. Bureau of Labor Statistics data), political and civic engagement indices, and reproductive patterns such as fertility rates and contraceptive use. These metrics offer a comprehensive view of how gender roles and trait associations have evolved in tandem with societal progress.

Purpose

Our study aims to expand upon previous findings by examining stereotype change on a larger scale using naturalistic data, providing deeper insight into the long-term trajectory of gender conceptualizations.

Methods

Two corpora were used in analysis:

Corpus | EngAll | COHA
Long Title | Google Books Ngram Corpus, All English | Corpus of Historical American English
Sources | Books predominantly in the English language published in any country | American text from the 1820s-2010s (fiction, magazine, newspaper, non-fiction, TV/movies)
Genre-balanced? | No | Yes
Total Words | 850 billion | 410 million
Decades | 1800-1999 | 1820-2009

The Mean Average Correlation (MAC) was the main metric, computed as follows:

  • Compile word lists (trait, job, agentic, communal)

  • For a given decade, compute the MAC score between each word in a list and each of the group terms “men” and “women”

    • A higher score implies higher similarity between the group and the word

  • Plot the scores for “men” against the scores for “women” and compute their Pearson correlation

  • Measure the change in this correlation over time; an increase suggests that the groups have grown more similar with respect to that word list
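The steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's actual pipeline. It assumes that the per-word score is the mean cosine similarity between a list word and a group's terms, and that each decade has its own embedding table. The two-dimensional vectors, the word list, and the helper names `group_score`, `pearson`, and `decade_correlation` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_score(embeddings, word, group_terms):
    """Assumed score: mean cosine similarity between `word` and the group's terms."""
    sims = [cosine(embeddings[word], embeddings[g]) for g in group_terms]
    return sum(sims) / len(sims)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def decade_correlation(embeddings, word_list, men_terms=("men",), women_terms=("women",)):
    """Correlate the men scores with the women scores across one word list."""
    men = [group_score(embeddings, w, men_terms) for w in word_list]
    women = [group_score(embeddings, w, women_terms) for w in word_list]
    return pearson(men, women)

# Hypothetical toy embeddings for two decades; the values are invented.
traits = {"brave": (0.9, 0.1), "kind": (0.1, 0.9), "calm": (0.5, 0.5)}
embeddings_by_decade = {
    1900: {"men": (1.0, 0.0), "women": (0.0, 1.0), **traits},
    2000: {"men": (1.0, 0.8), "women": (0.8, 1.0), **traits},
}

word_list = ["brave", "kind", "calm"]
correlations = {
    decade: decade_correlation(emb, word_list)
    for decade, emb in embeddings_by_decade.items()
}
```

In this toy data the group vectors move closer together in the later decade, so the correlation between the men and women scores rises, which is the pattern the final step interprets as the groups growing more similar with respect to the word list.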

Example Decade Plot

Conclusions

Discussion

References